This step installs the required packages, reads the data file, and loads the following packages: ‘plyr’, ‘dplyr’, ‘maps’, ‘ggplot2’, ‘ggmap’, ‘mapproj’, ‘scales’, ‘RColorBrewer’.
Which stations see more departures than arrivals?
## [1] 8 Ave & W 31 St
## 419 Levels: 1 Ave & E 15 St 1 Ave & E 18 St ... York St & Jay St
Which stations have more arrivals than departures?
with(station,Var1[which.min(asy)])
## [1] W 33 St & 7 Ave
## 419 Levels: 1 Ave & E 15 St 1 Ave & E 18 St ... York St & Jay St
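The lookup above relies on a per-station table with a net-flow column. A minimal sketch of how such a table could be built is shown here, assuming that `asy` is departures minus arrivals; the toy data and the column names `start.station.name` and `end.station.name` are assumptions, and only `station`, `Var1`, and `asy` come from the original code.

```r
# Toy trip table; column names are assumed, not taken from the original script.
trips <- data.frame(
  start.station.name = c("A", "A", "A", "B", "C", "C"),
  end.station.name   = c("B", "B", "C", "C", "B", "A"),
  stringsAsFactors = FALSE
)

# Use a shared set of levels so both tables line up station by station.
stations   <- sort(unique(c(trips$start.station.name, trips$end.station.name)))
departures <- table(factor(trips$start.station.name, levels = stations))
arrivals   <- table(factor(trips$end.station.name,   levels = stations))

# One row per station; asy > 0 means more departures than arrivals.
station <- data.frame(
  Var1 = factor(stations),
  asy  = as.vector(departures) - as.vector(arrivals)
)

most_departures <- with(station, Var1[which.max(asy)])  # largest excess of departures
most_arrivals   <- with(station, Var1[which.min(asy)])  # largest excess of arrivals
```

Note that `which.max` and `which.min` return only the first match, so ties between stations would go unreported.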
In the early morning, “8 Ave & W 31 St” has the greatest asymmetry between departures and arrivals.
## [1] "8 Ave & W 31 St"
In the early morning, E 47 St & Park Ave has the greatest asymmetry between arrivals and departures.
## [1] "E 47 St & Park Ave"
In the late morning, “8 Ave & W 31 St” has the greatest asymmetry between departures and arrivals
## [1] "8 Ave & W 31 St"
In the late morning, “E 17 St & Broadway” has the greatest asymmetry between arrivals and departures
## [1] "E 17 St & Broadway"
In the afternoon, “E 27 St & 1 Ave” has the greatest asymmetry between departures and arrivals
## [1] "E 27 St & 1 Ave"
In the afternoon, “W 41 St & 8 Ave” has the greatest asymmetry between arrivals and departures
## [1] "W 41 St & 8 Ave"
In the evening, “Pershing Square South” has the greatest asymmetry between departures and arrivals
## [1] "Pershing Square South"
In the evening, “8 Ave & W 31 St” has the greatest asymmetry between arrivals and departures
## [1] "8 Ave & W 31 St"
In the late evening, “W 20 St & 11 Ave” has the greatest asymmetry between departures and arrivals
## [1] "W 20 St & 11 Ave"
In the late evening, “E 7 St & Avenue A” has the greatest asymmetry between arrivals and departures
## [1] "E 7 St & Avenue A"
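The time-of-day results above depend on how start hours are grouped into periods, which the report does not state. The sketch below shows one way to do it; the hour boundaries, the helper names `time_bucket` and `top_asymmetry`, and the column names are all assumptions made for illustration.

```r
# Toy trips; column names and timestamps are illustrative assumptions.
trips <- data.frame(
  starttime = as.POSIXct(c("2015-08-03 06:15:00", "2015-08-03 06:40:00",
                           "2015-08-03 13:05:00", "2015-08-03 18:30:00"),
                         tz = "UTC"),
  start.station.name = c("A", "A", "B", "C"),
  end.station.name   = c("B", "C", "A", "A"),
  stringsAsFactors = FALSE
)

# Hypothetical period boundaries; the original cut points are not given.
time_bucket <- function(hour) {
  ifelse(hour >= 5  & hour < 9,  "early morning",
  ifelse(hour >= 9  & hour < 12, "late morning",
  ifelse(hour >= 12 & hour < 17, "afternoon",
  ifelse(hour >= 17 & hour < 21, "evening",
         "late evening"))))
}

trips$bucket <- time_bucket(as.integer(format(trips$starttime, "%H")))

# For one subset of trips, name the stations with the largest net outflow/inflow.
top_asymmetry <- function(d) {
  stations <- sort(unique(c(d$start.station.name, d$end.station.name)))
  asy <- as.vector(table(factor(d$start.station.name, levels = stations))) -
         as.vector(table(factor(d$end.station.name,   levels = stations)))
  c(departures = stations[which.max(asy)], arrivals = stations[which.min(asy)])
}

by_bucket <- lapply(split(trips, trips$bucket), top_asymmetry)
```

Different boundaries can change which station leads in each period, so the cut points are worth stating alongside the results.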
On Monday, “8 Ave & W 31 St” has the greatest asymmetry between departures and arrivals
## [1] "8 Ave & W 31 St"
On Monday, “W 33 St & 7 Ave” has the greatest asymmetry between arrivals and departures
## [1] "W 33 St & 7 Ave"
On Tuesday, “8 Ave & W 31 St” has the greatest asymmetry between departures and arrivals
## [1] "8 Ave & W 31 St"
On Tuesday, “E 7 St & Avenue A” has the greatest asymmetry between arrivals and departures
## [1] "E 7 St & Avenue A"
On Wednesday, “8 Ave & W 31 St” has the greatest asymmetry between departures and arrivals
## [1] "8 Ave & W 31 St"
On Wednesday, “W 33 St & 7 Ave” has the greatest asymmetry between arrivals and departures
## [1] "W 33 St & 7 Ave"
On Thursday, “8 Ave & W 31 St” has the greatest asymmetry between departures and arrivals
## [1] "8 Ave & W 31 St"
On Thursday, “W 33 St & 7 Ave” has the greatest asymmetry between arrivals and departures
## [1] "W 33 St & 7 Ave"
On Friday, “W 42 St & 8 Ave” has the greatest asymmetry between departures and arrivals
## [1] "W 42 St & 8 Ave"
On Friday, “W 33 St & 7 Ave” has the greatest asymmetry between arrivals and departures
## [1] "W 33 St & 7 Ave"
On Saturday, “E 32 St & Park Ave” has the greatest asymmetry between departures and arrivals
## [1] "E 32 St & Park Ave"
On Saturday, “Centre St & Chambers St” has the greatest asymmetry between arrivals and departures
## [1] "Centre St & Chambers St"
On Sunday, “8 Ave & W 31 St” has the greatest asymmetry between departures and arrivals
## [1] "8 Ave & W 31 St"
On Sunday, “Central Park S & 6 Ave” has the greatest asymmetry between arrivals and departures
## [1] "Central Park S & 6 Ave"
Across the week, “8 Ave & W 31 St” most often has the greatest excess of departures over arrivals: it leads on Sunday through Thursday, and also in the early morning and late morning. In the evening the pattern reverses, and this station has the greatest excess of arrivals over departures.
From Monday to Friday, either “E 7 St & Avenue A” or “W 33 St & 7 Ave” has the greatest excess of arrivals over departures.
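The per-day results could also be computed in one pass with a weekday-by-station cross-tabulation. This is a sketch under assumed column names and toy data, not the original code; it uses `as.POSIXlt(...)$wday` (0 = Sunday) so that the day labels do not depend on the system locale.

```r
day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
               "Thursday", "Friday", "Saturday")

# Toy trips: 2015-08-01 was a Saturday, 08-02 a Sunday, 08-03 a Monday.
trips <- data.frame(
  starttime = as.POSIXct(c("2015-08-01 10:00:00", "2015-08-02 11:00:00",
                           "2015-08-03 08:00:00", "2015-08-03 09:00:00"),
                         tz = "UTC"),
  start.station.name = c("A", "B", "A", "A"),
  end.station.name   = c("B", "A", "B", "C"),
  stringsAsFactors = FALSE
)

trips$weekday <- factor(day_names[as.POSIXlt(trips$starttime)$wday + 1],
                        levels = day_names)

stations <- sort(unique(c(trips$start.station.name, trips$end.station.name)))
dep <- unclass(table(trips$weekday, factor(trips$start.station.name, levels = stations)))
arr <- unclass(table(trips$weekday, factor(trips$end.station.name,   levels = stations)))

# Rows are weekdays, columns are stations; drop days without any trips.
asy <- (dep - arr)[rowSums(dep) > 0, , drop = FALSE]

top_departures <- setNames(stations[apply(asy, 1, which.max)], rownames(asy))
top_arrivals   <- setNames(stations[apply(asy, 1, which.min)], rownames(asy))
```

Keeping the full `asy` matrix, rather than only the top station per day, also makes it easy to check how close the runner-up stations are.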
Data Visualization

Map of stations where more trips start than end


Map of stations where more trips end than start

Which stations originate the longest rides? Does this vary by time of day?
Average trip durations based on time of day

Average trip durations by day of the week

Which stations originate the longest rides? The top 5 stations with the longest rides on average are shown below.
## Greene Ave & Nostrand Ave          684.2744
## Pulaski St & Marcus Garvey Blvd    538.1000
## Albany Ave & Fulton St             161.2796
## Lorimer St & Broadway              117.2749
## Putnam Ave & Throop Ave            108.5185
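A ranking like the one above can be produced by averaging trip duration per start station and sorting. The following is a minimal sketch; the column names, the toy values, and the conversion from seconds to minutes are assumptions, since the report does not state the unit of the figures shown.

```r
# Toy data; tripduration is assumed to be in seconds.
trips <- data.frame(
  start.station.name = c("A", "A", "B", "B", "C", "D"),
  tripduration       = c(600, 1200, 300, 500, 4000, 1500),
  stringsAsFactors = FALSE
)

trips$minutes <- trips$tripduration / 60

# Mean duration per start station, longest first.
mean_by_station <- tapply(trips$minutes, trips$start.station.name, mean)
top <- head(sort(mean_by_station, decreasing = TRUE), 3)
```

Because a mean is sensitive to a few very long trips, a median via `tapply(..., median)` may be a more robust alternative for this kind of ranking.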
Does this vary by time of day?
Top 3 Longest Early Morning Trips
## Lexington Ave & Classon Ave    374.9380
## Broadway & W 37 St             290.0227
## 2 Ave & E 31 St                126.4951
Top 3 Longest Late Morning Trips
## Pulaski St & Marcus Garvey Blvd    1303.10833
## Albany Ave & Fulton St              184.02222
## Grand St & Havemeyer St              59.86568
Top 3 Longest Afternoon Trips
## Greene Ave & Nostrand Ave    1135.5907
## Albany Ave & Fulton St        372.5375
## India St & East River         369.7608
Top 3 Longest Evening Trips
## Greene Ave & Nostrand Ave    595.5000
## Putnam Ave & Throop Ave      381.9167
## Driggs Ave & N Henry St      109.4211
Top 3 Longest Late Evening Trips
## Cadman Plaza W & Pierrepont St    205.6383
## Bond St & Schermerhorn St         156.0113
## Bushwick Ave & Powers St          155.3396
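Some of the averages above are very large, which may mean they rest on only a handful of trips. One way to check is to report the trip count next to each mean. The sketch below does this per time period; the `bucket` column, the other column names, and the toy values are assumptions.

```r
# Toy data: a time-of-day label, a start station, and a duration in minutes.
trips <- data.frame(
  bucket             = c("afternoon", "afternoon", "afternoon",
                         "evening", "evening", "evening"),
  start.station.name = c("A", "A", "B", "A", "B", "B"),
  minutes            = c(10, 20, 90, 12, 30, 50),
  stringsAsFactors = FALSE
)

# Mean duration and number of trips for each (period, station) pair.
means  <- aggregate(minutes ~ bucket + start.station.name, data = trips, FUN = mean)
counts <- aggregate(minutes ~ bucket + start.station.name, data = trips, FUN = length)
names(means)[3]  <- "mean_minutes"
names(counts)[3] <- "n_trips"
summary_tbl <- merge(means, counts)

# Top 3 stations per period, longest mean first, with the supporting trip count.
top_by_bucket <- lapply(split(summary_tbl, summary_tbl$bucket), function(d) {
  head(d[order(-d$mean_minutes), ], 3)
})
```

A station that tops the list with `n_trips` of 1 or 2 is better read as an outlier than as a pattern.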
Count of bike usage by gender

It is important to note that customers (non-subscribers) cannot record their gender. As a result, 220,996 data points have an “Unknown” gender. An additional 1,330 subscribers chose not to enter their gender. The remaining breakdown comes from subscribers. The data show that males rent a bike more than three times as often as females. Many factors may contribute to this, such as gender stereotypes or the proximity of offices to bike stations. However, none of these factors is measured by the Citi Bike data.
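The male-to-female comparison comes down to a ratio of two counts. A small sketch follows, assuming gender is stored as 0 (unknown), 1 (male), and 2 (female); that coding and the toy values are assumptions, not taken from the report.

```r
# Toy gender codes; 0 = unknown, 1 = male, 2 = female (assumed coding).
gender <- c(1, 1, 1, 2, 0, 1, 2, 0)

labelled <- factor(gender, levels = c(0, 1, 2),
                   labels = c("Unknown", "Male", "Female"))
counts <- table(labelled)

# Ratio of male to female trips, ignoring trips with unknown gender.
male_to_female <- as.numeric(counts["Male"]) / as.numeric(counts["Female"])
```

This is a ratio of trips, not of riders: one frequent subscriber contributes many rows.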
Count of bike usage by birth year

Once again, we only have birth year data for subscribers. The breakdown by birth year shows that the most frequent users were born between 1980 and 1990, making them 25 to 35 years old. More specifically, we see a spike in usage among 23-year-olds (born in 1992), in other words new, young working professionals. The greatest spike is among 30-year-olds (born in 1985), and usage declines with age from that point on. We would be interested to see how this distribution differs in cities where motor vehicle transportation is faster than in New York City, since slow vehicle traffic may encourage more people of all ages to rent bikes.
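The ages quoted here follow from subtracting the birth year from the year of the data. The sketch below assumes a data year of 2015, inferred from the pairs given in the text (1992 and 23, 1985 and 30); the toy birth years are illustrative.

```r
data_year  <- 2015  # assumption inferred from the text: 1992 -> 23, 1985 -> 30
birth_year <- c(1985, 1992, 1985, 1980, 1990, NA)  # NA: no birth year recorded

age <- data_year - birth_year

# table() drops NA by default, so trips without a birth year are excluded.
age_counts <- table(age)
modal_age  <- as.integer(names(age_counts)[which.max(age_counts)])
```

Since only the birth year is known, each computed age can be off by one depending on whether the rider's birthday has passed.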
Bar chart of user type

This bar graph shows that a large majority of trips come from subscribers, even in a warm summer vacation month like August. As a result, Citi Bike should focus on the stations and bikes used most frequently by subscribers, since their use is more continual and frequent.
Frequency of Bike Rental by Weekday and User Type

This chart shows the breakdown of user type by weekday. Looking at customers, we can see a few things. First, the customer count is substantially lower than the subscriber count on each day of the week. This is likely because customers are more likely to be visitors to NYC than locals. Second, we see large spikes in customer use on Saturday and Sunday, likely reflecting vacationers looking to travel around the city on the weekend. On those days, customers account for almost a third of all users.
The subscriber data is much more interesting. On Mondays, subscribers use bikes much more than on any other weekday. We speculate that this reflects employees who wake up on Monday morning ready to start a new week and eager to be healthier or more environmentally friendly. To achieve these goals, subscribers bike to or from work. However, as with any new goal, people encounter obstacles, and it is likely easier to drive or take public transportation to work. Those who found biking challenging on Monday choose not to bike on Tuesday. The result is a drop of almost 40,000 users on Tuesday. This number then recovers, and use is consistent across the remaining weekdays. Subscriber use is also fairly constant across the weekend days.
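The weekday comparisons above can be read off a two-way table of weekday by user type, with row proportions giving each group's share per day. This is a minimal sketch with toy data; the column names `weekday` and `usertype` are assumptions.

```r
# Toy data: one row per trip.
trips <- data.frame(
  weekday  = factor(c("Monday", "Monday", "Monday", "Tuesday", "Tuesday",
                      "Saturday", "Saturday", "Saturday"),
                    levels = c("Monday", "Tuesday", "Saturday")),
  usertype = c("Subscriber", "Subscriber", "Customer", "Subscriber", "Subscriber",
               "Subscriber", "Customer", "Subscriber"),
  stringsAsFactors = FALSE
)

# Trip counts per weekday and user type.
counts <- table(trips$weekday, trips$usertype)

# Share of each day's trips made by customers (rows sum to 1).
customer_share <- prop.table(counts, margin = 1)[, "Customer"]

# Change in subscriber trips from Monday to Tuesday.
monday_to_tuesday <- counts["Tuesday", "Subscriber"] - counts["Monday", "Subscriber"]
```

With a single month of data, each weekday appears only four or five times, so one unusual Monday (a holiday, a transit outage, bad weather on the other days) could produce a difference like the one described.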
Trip Duration by User Type

These box plots show the trip duration for each user type. However, at this scale the plot really only shows the most extreme outliers. The vast majority of data points appear to have a trip duration of zero, but realistically this cannot be the case. As a result, we need to look at the data without the outliers.
Trip Duration by User Type excluding outliers
## Warning: Removed 8507 rows containing non-finite values (stat_boxplot).

This plot shows the bulk of the distribution (the quartiles): trip durations for customers tend to be longer than for subscribers. This is particularly interesting because we know there are fewer customers than subscribers. Our group interpreted this as reflecting the habits of each user type. Customers, likely vacationers, are probably travelling more leisurely and therefore taking longer. They are also more likely to get lost. Together, these make their typical trip duration longer. Subscribers, on the other hand, are likely taking shorter, more frequent trips between places like the train station and their office.
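The warning printed with this plot is the message ggplot2 typically gives when values fall outside the limits set on an axis scale, which suggests the long trips were dropped before the quartiles were computed. As a cross-check, the sketch below uses base R rather than the report's plotting code: `boxplot(..., plot = FALSE)` computes the quartiles on all rows and flags outliers with the usual 1.5 × IQR rule. The toy data and column names are assumptions.

```r
# Toy durations in minutes, with one extreme trip per user type.
trips <- data.frame(
  usertype = rep(c("Customer", "Subscriber"), each = 6),
  minutes  = c(18, 22, 25, 27, 31, 900,
               6, 8, 10, 11, 13, 1400),
  stringsAsFactors = FALSE
)

# Compute box plot statistics without drawing; all rows enter the quartiles.
bp <- boxplot(minutes ~ usertype, data = trips, plot = FALSE)

medians    <- setNames(bp$stats[3, ], bp$names)  # row 3 of stats is the median
n_outliers <- length(bp$out)                     # points beyond the whiskers
```

To draw the plot without the outlier points while keeping them in the statistics, `boxplot(minutes ~ usertype, data = trips, outline = FALSE)` is one option.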
## [1] 7831
